UB-H: an unbalanced-hierarchical layer binary-wise construction method for high-dimensional data

Authors

Abstract

Cloud computing, in which data are distributed, stored and managed, is drawing attention as the volumes of data generated and stored increase. In addition, research on green computing, which increases energy efficiency, is also widely studied. An index is constructed to retrieve huge datasets efficiently, and layer-based indexing methods are used for efficient query processing. These methods construct a list of layers so that only one layer, instead of the entire dataset, is required for information retrieval. The existing methods construct the layers using a convex hull algorithm. However, the execution time of this method is very high, especially for large, high-dimensional datasets. Furthermore, as the total number of layers increases, query processing becomes efficient, but layer construction becomes slow. In this paper, we propose an unbalanced-hierarchical layer binary-wise construction method, which hierarchically divides the dimensions of the input data to increase the number of layers and reduce the layer building time. Through various experiments, we demonstrate that the proposed procedure significantly reduces the layer construction time compared with the existing methods.
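The layer-based indexing described above can be illustrated with the classic convex-hull approach that the paper compares against (convex hull peeling, as in the Onion technique): the outermost hull is peeled off as layer 0, the hull of the remaining points becomes layer 1, and so on. For a linear top-k query (maximize a weighted sum of attributes), the best answer always lies on layer 0 and the top k answers lie within the first k layers, so only those layers need to be scanned. The following is a minimal 2-D sketch under that assumption. It is not the UB-H method, and the function names (`convex_hull`, `build_layers`, `top_k`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical 2-D sketch of convex-hull layer (onion-style) indexing;
# this is the baseline idea, not the UB-H construction from the paper.
Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means o->a->b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order.

    Points lying on a hull edge (collinear) are not returned as vertices,
    so they stay behind and end up in a deeper layer.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def build_layers(points: Sequence[Point]) -> List[List[Point]]:
    """Convex hull peeling: layer 0 is the outer hull, layer 1 the hull of what is left, ..."""
    remaining = set(points)
    layers: List[List[Point]] = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)  # every pass removes at least one point, so this terminates
    return layers


def top_k(layers: List[List[Point]], weights: Point, k: int) -> List[Point]:
    """Linear top-k query (maximize w . p) that only scans the first k layers."""

    def score(p: Point) -> float:
        return weights[0] * p[0] + weights[1] * p[1]

    candidates = [p for layer in layers[:k] for p in layer]
    return sorted(candidates, key=score, reverse=True)[:k]


if __name__ == "__main__":
    data = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (3, 1), (3, 3), (1, 3), (2, 2)]
    layers = build_layers(data)
    print(len(layers))  # 3 layers: outer square, inner square, centre point
    print(top_k(layers, (1, 2), 2))  # [(4, 4), (3, 3)]
```

Each peeling pass recomputes a full hull, so building all layers repeats the most expensive step many times. In d dimensions a convex hull of n points can have on the order of n^⌊d/2⌋ facets, which is consistent with the abstract's observation that convex-hull layer construction is slow for large, high-dimensional datasets.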

Similar resources

Fast Binary Embedding for High-Dimensional Data

Binary embedding of high-dimensional data requires long codes to preserve the discriminative power of the input space. Traditional binary coding methods often suffer from very high computation and storage costs in such a scenario. To address this problem, we propose two solutions which improve over existing approaches. The first method, Bilinear Binary Embedding (BBE), converts high-dimensional ...

Hierarchical Binary Histograms for Summarizing Multi-Dimensional Data

The need to compress data into synopses of summarized information often arises in many application scenarios, where the aim is to retrieve aggregate data efficiently, possibly trading off the computational efficiency with the accuracy of the estimation. A widely used approach for summarizing multi-dimensional data is the histogram-based representation scheme, which consists in partitioning the ...

Methods for regression analysis in high-dimensional data

By evolving science, knowledge and technology, new and precise methods for measuring, collecting and recording information have been innovated, which have resulted in the appearance and development of high-dimensional data. The high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...

PCS: An Efficient Clustering Method for High-Dimensional Data

Clustering algorithms play an important role in data analysis and information retrieval. How to obtain a clustering for a large set of high-dimensional data suitable for database applications remains a challenge. We devise in this paper a set-theoretic clustering method called PCS (Pairwise Consensus Scheme) for high-dimensional data. Given a large set of d-dimensional data, PCS first constructs...

An $\ell_1$-Method for Clustering High-Dimensional Data

In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty, the unreliability of distances in very high-dimensional spaces. We propose a distance-based iterative method for clustering data in very high-dimensional space, usin...

Journal

Journal title: Computing

Year: 2021

ISSN: 0010-485X, 1436-5057

DOI: https://doi.org/10.1007/s00607-020-00871-0